Overview

Dataset statistics

Number of variables: 15
Number of observations: 16281
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 5
Duplicate rows (%): < 0.1%
Total size in memory: 1.9 MiB
Average record size in memory: 120.0 B
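
The memory figures in this report are consistent with every one of the 15 columns storing a single 8-byte value per row. That storage width is an assumption made for this check (the report does not state it); the sketch below only reproduces the arithmetic.

```python
# Assumption (not stated in the report): each column stores one 8-byte value per row.
N_ROWS = 16281          # number of observations
N_COLS = 15             # number of variables
BYTES_PER_VALUE = 8     # assumed storage width

record_bytes = N_COLS * BYTES_PER_VALUE            # bytes needed for one row
total_mib = N_ROWS * record_bytes / 1024 ** 2      # whole dataset, in MiB
column_kib = N_ROWS * BYTES_PER_VALUE / 1024       # a single column, in KiB

print(f"{record_bytes:.1f} B per record")
print(f"{total_mib:.1f} MiB in total")
print(f"{column_kib:.1f} KiB per column")
```

Under that assumption the three results round to the reported 120.0 B per record, 1.9 MiB in total, and the 127.2 KiB "Memory size" listed for each variable.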

Variable types

NUM: 13
BOOL: 2

Warnings

Dataset has 5 (< 0.1%) duplicate rows [Duplicates]
workclass has 963 (5.9%) zeros [Zeros]
education has 456 (2.8%) zeros [Zeros]
marital-status has 2190 (13.5%) zeros [Zeros]
occupation has 966 (5.9%) zeros [Zeros]
relationship has 6523 (40.1%) zeros [Zeros]
capital-gain has 14958 (91.9%) zeros [Zeros]
capital-loss has 15518 (95.3%) zeros [Zeros]
native-country has 274 (1.7%) zeros [Zeros]
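
Each percentage in the warnings is the count divided by the 16281 observations. The sketch below recomputes them; the `share` helper and its "< 0.1%" rule are assumptions inferred from how figures are displayed in this report, not the profiler's own code.

```python
N_ROWS = 16281  # number of observations from the overview

# Zero counts copied from the warnings list.
zero_counts = {
    "workclass": 963,
    "education": 456,
    "marital-status": 2190,
    "occupation": 966,
    "relationship": 6523,
    "capital-gain": 14958,
    "capital-loss": 15518,
    "native-country": 274,
}

def share(count, total=N_ROWS):
    """Format count/total with one decimal; tiny non-zero shares become '< 0.1%'.

    The threshold is an inferred display rule, labelled here as an assumption.
    """
    pct = 100 * count / total
    if 0 < pct < 0.05:
        return "< 0.1%"
    return f"{pct:.1f}%"

for name, count in zero_counts.items():
    print(f"{name}: {count} zeros ({share(count)})")

print(f"duplicate rows: 5 ({share(5)})")
```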

Reproduction

Analysis started: 2021-02-01 12:56:35.208525
Analysis finished: 2021-02-01 12:57:58.792813
Duration: 1 minute and 23.58 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

age
Real number (ℝ≥0)

Distinct: 73
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.767459
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 17
5th percentile: 19
Q1: 28
Median: 37
Q3: 48
95th percentile: 64
Maximum: 90
Range: 73
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.84918681
Coefficient of variation (CV): 0.3572374143
Kurtosis: -0.2205809761
Mean: 38.767459
Median Absolute Deviation (MAD): 10
Skewness: 0.5545794063
Sum: 631173
Variance: 191.7999754
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
35                      461  2.8%
33                      460  2.8%
23                      452  2.8%
36                      450  2.8%
38                      437  2.7%
31                      437  2.7%
41                      427  2.6%
32                      425  2.6%
37                      422  2.6%
30                      417  2.6%
Other values (63)     11893  73.0%

Minimum 5 values

Value                 Count  Frequency (%)
17                      200  1.2%
18                      312  1.9%
19                      341  2.1%
20                      360  2.2%
21                      376  2.3%

Maximum 5 values

Value                 Count  Frequency (%)
90                       12  0.1%
89                        2  < 0.1%
88                        3  < 0.1%
87                        2  < 0.1%
85                        2  < 0.1%
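
Several of the age figures can be derived from others reported for the same variable, which gives a quick consistency check. The sketch below is only that arithmetic, using values copied from the report; the variable names are illustrative.

```python
# Figures copied from the age statistics in the report.
n = 16281                       # number of observations
minimum, q1, q3, maximum = 17, 28, 48, 90
mean = 38.767459
std = 13.84918681

value_range = maximum - minimum   # "Range"
iqr = q3 - q1                     # "Interquartile range (IQR)"
cv = std / mean                   # "Coefficient of variation (CV)"
variance = std ** 2               # "Variance"
total = mean * n                  # "Sum" (approximate, because the mean is rounded)

print(value_range, iqr)
print(round(cv, 6), round(variance, 4), round(total))
```

The results agree with the reported Range (73), IQR (20), CV (0.3572374143), Variance (191.7999754) and Sum (631173) up to rounding of the inputs.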
 

workclass
Real number (ℝ≥0)

ZEROS

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.873533567
Minimum: 0
Maximum: 8
Zeros: 963
Zeros (%): 5.9%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4
Median: 4
Q3: 4
95th percentile: 6
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.480682441
Coefficient of variation (CV): 0.3822562567
Kurtosis: 1.564087531
Mean: 3.873533567
Median Absolute Deviation (MAD): 0
Skewness: -0.740161929
Sum: 63065
Variance: 2.192420492
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=9)]

Common values

Value                 Count  Frequency (%)
4                     11210  68.9%
6                      1321  8.1%
2                      1043  6.4%
0                       963  5.9%
7                       683  4.2%
5                       579  3.6%
1                       472  2.9%
8                         7  < 0.1%
3                         3  < 0.1%

Minimum 5 values

Value                 Count  Frequency (%)
0                       963  5.9%
1                       472  2.9%
2                      1043  6.4%
3                         3  < 0.1%
4                     11210  68.9%

Maximum 5 values

Value                 Count  Frequency (%)
8                         7  < 0.1%
7                       683  4.2%
6                      1321  8.1%
5                       579  3.6%
4                     11210  68.9%
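
Because workclass has only nine distinct values, its complete frequency table is listed, and the summary statistics can be recomputed from it. The sketch below does this with counts copied from the report; the helper name `weighted_median` is illustrative, and the n - 1 denominator is chosen because it is the convention that reproduces the reported variance.

```python
# Frequency table for workclass, copied from the report (value -> count).
counts = {4: 11210, 6: 1321, 2: 1043, 0: 963, 7: 683, 5: 579, 1: 472, 8: 7, 3: 3}

n = sum(counts.values())                              # number of observations
total = sum(value * c for value, c in counts.items())  # "Sum"
mean = total / n
# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)

def weighted_median(table):
    """Return the middle value of the sorted data (lower middle if n is even)."""
    half = (sum(table.values()) + 1) // 2
    seen = 0
    for value in sorted(table):
        seen += table[value]
        if seen >= half:
            return value

print(n, total, round(mean, 6), round(variance, 6), weighted_median(counts))
```

The recomputed values match the reported number of observations (16281), Sum (63065), Mean (3.873533567), Variance (2.192420492) and median (4).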
 

fnlwgt
Real number (ℝ≥0)

Distinct: 12787
Distinct (%): 78.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189435.6778
Minimum: 13492
Maximum: 1490400
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 13492
5th percentile: 40641
Q1: 116736
Median: 177831
Q3: 238384
95th percentile: 378922
Maximum: 1490400
Range: 1476908
Interquartile range (IQR): 121648

Descriptive statistics

Standard deviation: 105714.9077
Coefficient of variation (CV): 0.5580517298
Kurtosis: 5.739969535
Mean: 189435.6778
Median Absolute Deviation (MAD): 60904
Skewness: 1.422954131
Sum: 3084202270
Variance: 1.11756417e+10
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
136986                    9  0.1%
190290                    8  < 0.1%
125892                    8  < 0.1%
203488                    8  < 0.1%
127651                    8  < 0.1%
117310                    8  < 0.1%
120277                    8  < 0.1%
111567                    7  < 0.1%
48520                     7  < 0.1%
126569                    7  < 0.1%
Other values (12777)  16203  99.5%

Minimum 5 values

Value                 Count  Frequency (%)
13492                     1  < 0.1%
13769                     2  < 0.1%
13862                     1  < 0.1%
19302                     1  < 0.1%
19410                     1  < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1490400                   1  < 0.1%
1210504                   1  < 0.1%
1117718                   1  < 0.1%
1047822                   1  < 0.1%
1024535                   1  < 0.1%
 

education
Real number (ℝ≥0)

ZEROS

Distinct: 16
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.26884098
Minimum: 0
Maximum: 15
Zeros: 456
Zeros (%): 2.8%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 9
Median: 11
Q3: 12
95th percentile: 15
Maximum: 15
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.882980275
Coefficient of variation (CV): 0.3781322821
Kurtosis: 0.6687252982
Mean: 10.26884098
Median Absolute Deviation (MAD): 2
Skewness: -0.9407981888
Sum: 167187
Variance: 15.07753581
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=16)]

Common values

Value                 Count  Frequency (%)
11                     5283  32.4%
15                     3587  22.0%
9                      2670  16.4%
12                      934  5.7%
8                       679  4.2%
1                       637  3.9%
7                       534  3.3%
0                       456  2.8%
5                       309  1.9%
14                      258  1.6%
Other values (6)        934  5.7%

Minimum 5 values

Value                 Count  Frequency (%)
0                       456  2.8%
1                       637  3.9%
2                       224  1.4%
3                        79  0.5%
4                       176  1.1%

Maximum 5 values

Value                 Count  Frequency (%)
15                     3587  22.0%
14                      258  1.6%
13                       32  0.2%
12                      934  5.7%
11                     5283  32.4%
 

education-num
Real number (ℝ≥0)

Distinct: 16
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.07290707
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 5
Q1: 9
Median: 10
Q3: 12
95th percentile: 14
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.567545259
Coefficient of variation (CV): 0.2548961527
Kurtosis: 0.6308129327
Mean: 10.07290707
Median Absolute Deviation (MAD): 1
Skewness: -0.3263376289
Sum: 163997
Variance: 6.592288655
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=16)]

Common values

Value                 Count  Frequency (%)
9                      5283  32.4%
10                     3587  22.0%
13                     2670  16.4%
14                      934  5.7%
11                      679  4.2%
7                       637  3.9%
12                      534  3.3%
6                       456  2.8%
4                       309  1.9%
15                      258  1.6%
Other values (6)        934  5.7%

Minimum 5 values

Value                 Count  Frequency (%)
1                        32  0.2%
2                        79  0.5%
3                       176  1.1%
4                       309  1.9%
5                       242  1.5%

Maximum 5 values

Value                 Count  Frequency (%)
16                      181  1.1%
15                      258  1.6%
14                      934  5.7%
13                     2670  16.4%
12                      534  3.3%
 

marital-status
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.632577851
Minimum: 0
Maximum: 6
Zeros: 2190
Zeros (%): 13.5%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 2
Q3: 4
95th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.510611131
Coefficient of variation (CV): 0.5738144193
Kurtosis: -0.5360876441
Mean: 2.632577851
Median Absolute Deviation (MAD): 2
Skewness: -0.02208331243
Sum: 42861
Variance: 2.281945989
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=7)]

Common values

Value                 Count  Frequency (%)
2                      7403  45.5%
4                      5434  33.4%
0                      2190  13.5%
6                       525  3.2%
5                       505  3.1%
3                       210  1.3%
1                        14  0.1%

Minimum 5 values

Value                 Count  Frequency (%)
0                      2190  13.5%
1                        14  0.1%
2                      7403  45.5%
3                       210  1.3%
4                      5434  33.4%

Maximum 5 values

Value                 Count  Frequency (%)
6                       525  3.2%
5                       505  3.1%
4                      5434  33.4%
3                       210  1.3%
2                      7403  45.5%
 

occupation
Real number (ℝ≥0)

ZEROS

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.587617468
Minimum: 0
Maximum: 14
Zeros: 966
Zeros (%): 5.9%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 3
Median: 7
Q3: 10
95th percentile: 13
Maximum: 14
Range: 14
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.233925111
Coefficient of variation (CV): 0.6427096187
Kurtosis: -1.239314291
Mean: 6.587617468
Median Absolute Deviation (MAD): 4
Skewness: 0.1025083781
Sum: 107253
Variance: 17.92612185
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=15)]

Common values

Value                 Count  Frequency (%)
10                     2032  12.5%
4                      2020  12.4%
3                      2013  12.4%
12                     1854  11.4%
1                      1841  11.3%
8                      1628  10.0%
7                      1020  6.3%
0                       966  5.9%
14                      758  4.7%
6                       702  4.3%
Other values (5)       1447  8.9%

Minimum 5 values

Value                 Count  Frequency (%)
0                       966  5.9%
1                      1841  11.3%
2                         6  < 0.1%
3                      2013  12.4%
4                      2020  12.4%

Maximum 5 values

Value                 Count  Frequency (%)
14                      758  4.7%
13                      518  3.2%
12                     1854  11.4%
11                      334  2.1%
10                     2032  12.5%
 

relationship
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.437135311
Minimum: 0
Maximum: 5
Zeros: 6523
Zeros (%): 40.1%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 3
95th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.592903257
Coefficient of variation (CV): 1.108387808
Kurtosis: -0.7250692478
Mean: 1.437135311
Median Absolute Deviation (MAD): 1
Skewness: 0.8016181691
Sum: 23398
Variance: 2.537340786
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=6)]

Common values

Value                 Count  Frequency (%)
0                      6523  40.1%
1                      4278  26.3%
3                      2513  15.4%
4                      1679  10.3%
5                       763  4.7%
2                       525  3.2%

Minimum 5 values

Value                 Count  Frequency (%)
0                      6523  40.1%
1                      4278  26.3%
2                       525  3.2%
3                      2513  15.4%
4                      1679  10.3%

Maximum 5 values

Value                 Count  Frequency (%)
5                       763  4.7%
4                      1679  10.3%
3                      2513  15.4%
2                       525  3.2%
1                      4278  26.3%
 

race
Real number (ℝ≥0)

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.672440268
Minimum: 0
Maximum: 4
Zeros: 159
Zeros (%): 1.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 4
Median: 4
Q3: 4
95th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8403273884
Coefficient of variation (CV): 0.2288198928
Kurtosis: 5.108549205
Mean: 3.672440268
Median Absolute Deviation (MAD): 0
Skewness: -2.473237164
Sum: 59791
Variance: 0.7061501197
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=5)]

Common values

Value                 Count  Frequency (%)
4                     13946  85.7%
2                      1561  9.6%
1                       480  2.9%
0                       159  1.0%
3                       135  0.8%

Minimum 5 values

Value                 Count  Frequency (%)
0                       159  1.0%
1                       480  2.9%
2                      1561  9.6%
3                       135  0.8%
4                     13946  85.7%

Maximum 5 values

Value                 Count  Frequency (%)
4                     13946  85.7%
3                       135  0.8%
2                      1561  9.6%
1                       480  2.9%
0                       159  1.0%
 

sex
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB

Value                 Count  Frequency (%)
1                     10860  66.7%
0                      5421  33.3%

capital-gain
Real number (ℝ≥0)

ZEROS

Distinct: 113
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1081.905104
Minimum: 0
Maximum: 99999
Zeros: 14958
Zeros (%): 91.9%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 4865
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7583.935968
Coefficient of variation (CV): 7.009797753
Kurtosis: 148.6715143
Mean: 1081.905104
Median Absolute Deviation (MAD): 0
Skewness: 11.77829263
Sum: 17614497
Variance: 57516084.77
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
0                     14958  91.9%
15024                   166  1.0%
7688                    126  0.8%
7298                    118  0.7%
99999                    85  0.5%
3103                     55  0.3%
5178                     49  0.3%
5013                     48  0.3%
4386                     38  0.2%
3325                     28  0.2%
Other values (103)      610  3.7%

Minimum 5 values

Value                 Count  Frequency (%)
0                     14958  91.9%
114                       2  < 0.1%
401                       3  < 0.1%
594                      18  0.1%
914                       2  < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
99999                    85  0.5%
41310                     1  < 0.1%
34095                     1  < 0.1%
27828                    24  0.1%
25236                     3  < 0.1%
 

capital-loss
Real number (ℝ≥0)

ZEROS

Distinct: 82
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.89926909
Minimum: 0
Maximum: 3770
Zeros: 15518
Zeros (%): 95.3%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 3770
Range: 3770
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 403.1052856
Coefficient of variation (CV): 4.585991326
Kurtosis: 19.29718195
Mean: 87.89926909
Median Absolute Deviation (MAD): 0
Skewness: 4.52064718
Sum: 1431088
Variance: 162493.8713
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
0                     15518  95.3%
1902                    102  0.6%
1977                     85  0.5%
1887                     74  0.5%
2415                     23  0.1%
1590                     22  0.1%
1876                     20  0.1%
1741                     20  0.1%
1485                     20  0.1%
1564                     18  0.1%
Other values (72)       379  2.3%

Minimum 5 values

Value                 Count  Frequency (%)
0                     15518  95.3%
213                       1  < 0.1%
323                       2  < 0.1%
625                       5  < 0.1%
653                       1  < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
3770                      2  < 0.1%
3175                      2  < 0.1%
3004                      3  < 0.1%
2824                      4  < 0.1%
2603                      2  < 0.1%
 

hours-per-week
Real number (ℝ≥0)

Distinct: 89
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.39223635
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 16
Q1: 40
Median: 40
Q3: 45
95th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.47933225
Coefficient of variation (CV): 0.3089537341
Kurtosis: 3.016899125
Mean: 40.39223635
Median Absolute Deviation (MAD): 3
Skewness: 0.2604188513
Sum: 657626
Variance: 155.7337333
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count  Frequency (%)
40                     7586  46.6%
50                     1427  8.8%
45                      893  5.5%
60                      702  4.3%
35                      640  3.9%
20                      638  3.9%
30                      551  3.4%
55                      357  2.2%
25                      284  1.7%
48                      253  1.6%
Other values (79)      2950  18.1%

Minimum 5 values

Value                 Count  Frequency (%)
1                         7  < 0.1%
2                        21  0.1%
3                        20  0.1%
4                        30  0.2%
5                        35  0.2%

Maximum 5 values

Value                 Count  Frequency (%)
99                       52  0.3%
98                        3  < 0.1%
96                        4  < 0.1%
92                        2  < 0.1%
90                       13  0.1%
 

native-country
Real number (ℝ≥0)

ZEROS

Distinct: 41
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.81033106
Minimum: 0
Maximum: 41
Zeros: 274
Zeros (%): 1.7%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 19
Q1: 39
Median: 39
Q3: 39
95th percentile: 39
Maximum: 41
Range: 41
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7.677427516
Coefficient of variation (CV): 0.2085671955
Kurtosis: 13.27595732
Mean: 36.81033106
Median Absolute Deviation (MAD): 0
Skewness: -3.754228123
Sum: 599309
Variance: 58.94289326
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=41)]

Common values

Value                 Count  Frequency (%)
39                    14662  90.1%
26                      308  1.9%
0                       274  1.7%
30                       97  0.6%
33                       70  0.4%
11                       69  0.4%
2                        61  0.4%
19                       51  0.3%
8                        49  0.3%
3                        47  0.3%
Other values (31)       593  3.6%

Minimum 5 values

Value                 Count  Frequency (%)
0                       274  1.7%
1                         9  0.1%
2                        61  0.4%
3                        47  0.3%
4                        26  0.2%

Maximum 5 values

Value                 Count  Frequency (%)
41                        7  < 0.1%
40                       19  0.1%
39                    14662  90.1%
38                        8  < 0.1%
37                       12  0.1%
 

income
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB

Value                 Count  Frequency (%)
0                     12435  76.4%
1                      3846  23.6%

Interactions

[169 interaction plots; images not preserved in this text version.]

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
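
A minimal sketch of that calculation, using only the Python standard library. The function name `pearson_r` and the toy data are illustrative and not taken from the dataset; the sketch assumes both inputs have non-zero variance.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Toy data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                  # exact linear function of x

r_linear = pearson_r(x, y)
r_rescaled = pearson_r(x, [3 * v + 7 for v in y])   # shifted and rescaled y
r_negative = pearson_r(x, [-v for v in y])          # slope with the opposite sign

print(r_linear, r_rescaled, r_negative)
```

Shifting and rescaling y leaves r unchanged, while reversing the sign of the slope flips the sign of r, which illustrates the invariance property described above.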

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
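
A minimal sketch of that calculation in plain Python. The helper names and the toy data are illustrative; giving tied values the mean of their positions is a common convention that is assumed here, since the text does not say how ties are ranked.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data, made up for illustration: y is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this monotonic but nonlinear toy relationship, ρ reaches its maximum while Pearson's r stays below it.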

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
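
A minimal sketch that follows this description literally (the form often called tau-a). The function name and the toy data are illustrative. Tie-adjusted variants such as tau-b give different values when ties are present, and the text does not say which variant produced the figures in this report.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A product of zero is a tie and counts as neither.
    return (concordant - discordant) / len(pairs)

# Toy data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau(x, y)
print(tau)
```

Of the 10 pairs in this toy example, 7 are concordant and 3 are discordant, so τ = (7 - 3) / 10.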

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Two missing-value figures; images not preserved in this text version.]

Sample

First rows

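index  age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income
    0   25          4  226802          1              7               4           7             3     2    1             0             0              40              39       0
    1   38          4   89814         11              9               2           5             0     4    1             0             0              50              39       0
    2   28          2  336951          7             12               2          11             0     4    1             0             0              40              39       1
    3   44          4  160323         15             10               2           7             0     2    1          7688             0              40              39       1
    4   18          0  103497         15             10               4           0             3     4    0             0             0              30              39       0
    5   34          4  198693          0              6               4           8             1     4    1             0             0              30              39       0
    6   29          0  227026         11              9               4           0             4     2    1             0             0              40              39       0
    7   63          6  104626         14             15               2          10             0     4    1          3103             0              32              39       1
    8   24          4  369667         15             10               4           8             4     4    0             0             0              40              39       0
    9   55          4  104996          5              4               2           3             0     4    1             0             0              10              39       0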
955410499654230410010390

Last rows

index  age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income
16271   61          4   89686         11              9               2          12             0     4    1             0             0              48              39       0
16272   31          4  440129         11              9               2           3             0     4    1             0             0              40              39       0
16273   25          4  350977         11              9               4           8             3     4    0             0             0              40              39       0
16274   48          2  349230         12             14               0           8             1     4    1             0             0              40              39       0
16275   33          4  245211          9             13               4          10             3     4    1             0             0              40              39       0
16276   39          4  215419          9             13               0          10             1     4    0             0             0              36              39       0
16277   64          0  321403         11              9               6           0             2     2    1             0             0              40              39       0
16278   38          4  374983          9             13               2          10             0     4    1             0             0              50              39       0
16279   44          4   83891          9             13               0           1             3     1    1          5455             0              40              39       0
16280   35          5  182148          9             13               2           4             0     4    1             0             0              60              39       1

Duplicate rows

Most frequent

index  age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income  count
    0   18          5  378036          2              8               4           5             3     4    1             0             0              10              39       0      2
    1   24          4  194630          9             13               4          10             1     4    1             0             0              35              39       0      2
    2   29          4   36440          9             13               4           1             1     4    0             0             0              40              39       0      2
    3   30          4  180317          8             11               0           7             1     4    1             0             0              40              39       0      2
    4   37          4   52870          9             13               2           4             0     4    1             0             0              40              39       0      2
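
A table like this one can be produced by counting how often each complete row occurs. The sketch below illustrates the idea on made-up three-column rows with the standard library; it is not the profiler's implementation, and the variable names are illustrative.

```python
from collections import Counter

# Toy rows, made up for illustration; each tuple stands for one complete record.
rows = [
    (18, 5, 10),
    (24, 4, 35),
    (18, 5, 10),   # repeats the first row
    (29, 4, 40),
    (24, 4, 35),   # repeats the second row
]

row_counts = Counter(rows)

# Rows that occur more than once, with their number of occurrences ("count").
duplicated = {row: n for row, n in row_counts.items() if n > 1}

distinct_duplicated = len(duplicated)                    # distinct rows that repeat
surplus_copies = sum(n - 1 for n in duplicated.values())  # copies beyond the first

print(duplicated)
print(distinct_duplicated, surplus_copies)
```

When every duplicated row occurs exactly twice, as in the table above, the two tallies coincide: five distinct rows and five surplus copies, matching the 5 duplicate rows reported in the overview.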